This lecture discusses how to leverage the plotly R package to create a variety of interactive graphics.
There are two main ways to create a plotly object:
1. transforming a ggplot2 object into a plotly object with ggplotly();
2. directly initializing a plotly object with plot_ly()/plot_geo()/plot_mapbox().
Both approaches have somewhat complementary strengths and weaknesses, so it can pay off to learn both approaches.
Moreover, both approaches implement the Grammar of Graphics and both are powered by the JavaScript graphing library plotly.js, so many of the same concepts and tools that you learn for one interface can be reused in the other.
plot_ly()
Any graph made with the plotly R package is powered by the JavaScript library plotly.js.
The plot_ly() function provides a ‘direct’ interface to
plotly.js with some additional abstractions to help reduce typing.
These abstractions, inspired by the Grammar of Graphics and ggplot2, make it much faster to iterate from one graphic to another, making it easier to discover interesting features in the data.
A rich gallery of examples is available at https://plotly.com/r/
Using plot_ly() to explore the diamonds
dataset from ggplot2.
# load the plotly R package:
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# load the diamonds dataset from the ggplot2 package:
data(diamonds, package = "ggplot2")
diamonds
## # A tibble: 53,940 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
## # ℹ 53,930 more rows
If we assign variable names (e.g., cut,
clarity, etc.) to visual properties (e.g., x,
y, color, etc.) within plot_ly(),
it tries to find a sensible geometric representation of that information
for us.
Examine cut:
str(diamonds$cut)
## Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
head(diamonds$cut, 30)
## [1] Ideal Premium Good Premium Good Very Good Very Good
## [8] Very Good Fair Very Good Good Ideal Premium Ideal
## [15] Premium Premium Ideal Good Good Very Good Good
## [22] Very Good Very Good Very Good Very Good Very Good Premium Very Good
## [29] Very Good Very Good
## Levels: Fair < Good < Very Good < Premium < Ideal
What can be a “sensible” representation when we assign the values of
cut to the x property?
plot_ly(diamonds, x = ~cut)
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
Note: the plot appears in the Viewer tab in RStudio, not
in the Plots tab.
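The console message above only reports the trace type that plot_ly() inferred. A minimal sketch (assuming nothing beyond plotly and the diamonds data) that names the type up front, which avoids the message, and then uses plotly_build() (covered later in this lecture) to confirm which type ends up in the figure:

```r
library(plotly)
data(diamonds, package = "ggplot2")

# Name the plotly.js trace type explicitly instead of letting plot_ly() guess
p_hist <- plot_ly(diamonds, x = ~cut, type = "histogram")

# plotly_build() returns the figure definition that is sent to plotly.js
b_hist <- plotly_build(p_hist)
length(b_hist$x$data)     # a single trace
b_hist$x$data[[1]]$type   # "histogram"
```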
Examine clarity:
head(diamonds$clarity, 30)
## [1] SI2 SI1 VS1 VS2 SI2 VVS2 VVS1 SI1 VS2 VS1 SI1 VS1 SI1 SI2 SI2
## [16] I1 SI2 SI1 SI1 SI1 SI2 VS2 VS1 SI1 SI1 VVS2 VS1 VS2 VS2 VS1
## Levels: I1 < SI2 < SI1 < VS2 < VS1 < VVS2 < VVS1 < IF
What can be a “sensible” representation when we assign the values of
cut to the x property and the values of
clarity to the y property?
plot_ly(diamonds, x = ~cut, y = ~clarity)
## No trace type specified:
## Based on info supplied, a 'histogram2d' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram2d
(Note the output in the console.)
What can be a “sensible” representation when we assign the values of
cut to the x property and the values of
clarity to the color property?
plot_ly(diamonds, x = ~cut, color = ~clarity)
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
The plot_ly() function has numerous arguments that are
unique to the R package (e.g., color, stroke,
span, symbol, linetype, etc.) and
make it easier to encode data variables (e.g., diamond clarity) as
visual properties (e.g., color).
plot_ly(diamonds, x = ~cut, color = ~clarity, colors = "Accent")
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
In the last example, color is used to map each level of
diamond clarity to a different color, then colors is used
to specify the range of colors (in this case, the “Accent” color
palette from the RColorBrewer package; one can also
supply custom color codes or a color palette function like
colorRamp()).
Try:
plot_ly(diamonds, x = ~cut, color = "black")
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
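Since these arguments map data values to a visual range by default, you will obtain unexpected results if you try to specify the visual range directly: above, "black" is treated as a constant data value and mapped through the default palette (hence the RColorBrewer warnings), so the bars are not drawn in black.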
If you want to specify the visual range directly, use the
I() function to declare this value to be taken ‘AsIs’:
plot_ly(
diamonds,
x = ~cut,
color = I("red"),
stroke = I("black"),
span = I(5)
)
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
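To see the difference that I() makes, a small sketch comparing the fill color (the trace's marker.color) that plotly_build() assigns in each case. The exact color codes depend on the default palette, so only the fact that they differ matters here:

```r
library(plotly)
data(diamonds, package = "ggplot2")

# "black" is treated as data: it forms one group that goes through the
# default palette (RColorBrewer warnings are silenced here)
mapped <- suppressWarnings(plotly_build(
  plot_ly(diamonds, x = ~cut, color = "black", type = "histogram")
))

# I("black") is taken 'AsIs', so black is used directly
asis <- plotly_build(
  plot_ly(diamonds, x = ~cut, color = I("black"), type = "histogram")
)

mapped_col <- mapped$x$data[[1]]$marker$color
asis_col <- asis$x$data[[1]]$marker$color
c(mapped = mapped_col, asis = asis_col)
```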
A good resource to learn more about these arguments (especially their
defaults) is the R documentation page available by entering
help(plot_ly) in your R console.
The plotly package takes a purely functional approach to a layered grammar of graphics: (almost) every function expects a plotly object as its first argument and returns a modified version of that plotly object.
For example, the layout() function expects a
plotly object as its first argument, and its other
arguments add and/or modify various layout components of that object
(e.g., the title):
layout(
plot_ly(diamonds, x = ~cut),
title = "My beautiful histogram"
)
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
For more complex plots that modify a plotly graph many times over, code written in this way can become cumbersome to read.
The %>% operator simplifies this by placing the
object on the left-hand side of the %>% into the first
argument of the function on the right-hand side:
diamonds %>%
plot_ly(x = ~cut) %>%
layout(title = "My beautiful histogram")
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
In addition to layout() for adding/modifying part(s) of
the graph’s layout, there is also a family of add_*()
functions (e.g., add_histogram(), add_lines(),
etc.) that add a graphical layer to a plot.
A layer can be thought of as a group of graphical elements
that can be sufficiently described using only 5 components: data,
aesthetic mappings (e.g., assigning clarity to
color), a geometric representation (e.g., rectangles,
circles, etc.), statistical transformations (e.g., sum, mean, etc.), and
positional adjustments (e.g., dodge, stack, etc.).
In the examples thus far, we have not specified a layer; the layer
has been added for us automatically by plot_ly().
To be explicit about what plot_ly(diamonds, x = ~cut)
generates, we should add an add_histogram() layer:
add_histogram(plot_ly(diamonds), x = ~cut)
Exercise: Rewrite the above line using two pipe operators
As we’ll discuss later, plotly has both
add_histogram() and add_bars(). The difference
is that add_histogram() computes statistics (i.e., runs
a binning algorithm) dynamically in the web browser, whereas
add_bars() requires the bar heights to be pre-specified.
That means, to replicate the last example with add_bars(),
the number of observations must be computed ahead of time.
Find out the arguments that are required for add_bars()
and use the dplyr::count function to make a bar plot that
is, in fact, a histogram.
There are numerous other add_*() functions that
calculate statistics in the browser (e.g.,
add_histogram2d(), add_contour(),
add_boxplot(), etc.), but most other functions aren’t
considered statistical.
Generally speaking, non-statistical layers will be faster and more responsive at runtime (since they require less computational work), whereas the statistical layers allow for more flexibility when it comes to client-side interactivity.
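One way to see where the computational work happens is to compare how many x values each kind of layer ships to the browser. A sketch, assuming dplyr is installed (for count()) and using plotly_build() to inspect both figures:

```r
library(plotly)
data(diamonds, package = "ggplot2")

# Statistical layer: every raw 'cut' value is sent to the browser,
# where plotly.js does the binning
hist_fig <- plotly_build(plot_ly(diamonds) %>% add_histogram(x = ~cut))

# Non-statistical layer: counts are computed in R first, so only one
# value per bar is sent
cut_counts <- dplyr::count(diamonds, cut)
bars_fig <- plotly_build(plot_ly(cut_counts) %>% add_bars(x = ~cut, y = ~n))

length(hist_fig$x$data[[1]]$x)  # one value per diamond
length(bars_fig$x$data[[1]]$x)  # one value per cut category
```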
In many scenarios, it can be useful to combine multiple graphical
layers into a single plot. In this case, it becomes useful to know a few
things about plot_ly():
1. Arguments specified in plot_ly() are global,
meaning that any downstream add_*() functions inherit these
arguments (unless inherit = FALSE).
2. Data manipulation verbs from the dplyr package may be used to transform the data underlying a
plotly object. (Technically speaking, these dplyr verbs are S3 generic functions that have a plotly method. In nearly every case, that method simply queries the data underlying the plotly object, applies the dplyr function, then adds the transformed data back into the resulting plotly object.)
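A minimal sketch of the first property, using a hypothetical three-layer plot: the first two layers inherit the global x = ~cut (the second one evaluates it on its own pre-computed counts), while the third opts out with inherit = FALSE and supplies its own mappings. It assumes dplyr is installed for count(), and it deliberately avoids the dplyr-verb pipeline used in the exercise that follows:

```r
library(plotly)
data(diamonds, package = "ggplot2")

# Pre-computed counts (one row per cut), used by the last two layers
cut_counts <- dplyr::count(diamonds, cut)

p_layers <- plot_ly(diamonds, x = ~cut) %>%
  # inherits x = ~cut, evaluated on the full diamonds data
  add_histogram(name = "histogram") %>%
  # also inherits x = ~cut, but evaluated on this layer's own data
  add_markers(data = cut_counts, y = ~n, name = "counts") %>%
  # inherit = FALSE: the global x is ignored, so x is mapped again here
  add_lines(data = cut_counts, x = ~cut, y = ~n, name = "line",
            inherit = FALSE)

b_layers <- plotly_build(p_layers)
layer_types <- vapply(b_layers$x$data, function(tr) tr$type, character(1))
layer_types
lengths(lapply(b_layers$x$data, function(tr) tr$x))
```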
Using these two properties of plot_ly(), we can (for
example):
1. globally assign cut to x;
2. add a histogram layer (which inherits the x from plot_ly());
3. use dplyr verbs to modify the data underlying the plotly object (here we just count the number of diamonds in each cut category);
4. add a layer of text using the summarized counts. Note that the global x mapping, as well as the other mappings local to this text layer (text and y), reflects data values from step 3.
Complete the following code so that it accomplishes the above 4 steps:
library(dplyr)
(p1 <-
diamonds %>%
plot_ly( "" ) %>%
add_histogram( "" ) %>%
dplyr::""( "" ) %>%
summarise(n = "") %>%
add_text(
text = ~scales::comma(n), y = ~n,
textposition = "top middle",
cliponaxis = FALSE
)
)
Before using multiple add_*() in a single plot, make
sure that you actually want to show those layers of information on the
same set of axes.
When using dplyr verbs to modify the
data underlying the plotly object, you can
use the plotly_data() function to obtain the data at any
point in time, which is primarily useful for debugging purposes.
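A short sketch of that debugging workflow, filtering on an arbitrary price threshold (10,000, chosen only for illustration): dplyr::filter() applied directly to the plotly object should leave exactly the matching rows in plotly_data():

```r
library(plotly)
data(diamonds, package = "ggplot2")

expensive <- plot_ly(diamonds, x = ~cut) %>%
  add_histogram() %>%
  # dplyr verbs dispatch to plotly methods and modify the underlying data
  dplyr::filter(price > 10000)

# plotly_data() shows the data the figure will actually use
n_in_plot <- nrow(plotly_data(expensive))
n_expected <- sum(diamonds$price > 10000)
c(in_plot = n_in_plot, expected = n_expected)
```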
What are the differences between the following
plotly objects and data extracted using
plotly_data()?
(plotly_obj1 <-
diamonds %>%
plot_ly(x = ~cut) %>%
add_histogram())
# to:
(plotly_dat1 <-
diamonds %>%
plot_ly(x = ~cut) %>%
add_histogram() %>%
plotly_data())
## # A tibble: 53,940 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
## # ℹ 53,930 more rows
And:
(plotly_obj2 <-
diamonds %>%
plot_ly(x = ~cut) %>%
add_histogram() %>%
group_by(cut) %>%
summarise(n = n()))
# To:
(plotly_dat2 <-
diamonds %>%
plot_ly(x = ~cut) %>%
add_histogram() %>%
group_by(cut) %>%
summarise(n = n()) %>%
plotly_data())
## # A tibble: 5 × 2
## cut n
## <ord> <int>
## 1 Fair 1610
## 2 Good 4906
## 3 Very Good 12082
## 4 Premium 13791
## 5 Ideal 21551
And:
(plotly_obj3 <-
diamonds %>%
dplyr::count(cut) %>%
plot_ly() %>%
add_bars(x = ~cut, y = ~n))
# To:
(plotly_dat3 <-
diamonds %>%
dplyr::count(cut) %>%
plot_ly() %>%
add_bars(x = ~cut, y = ~n) %>%
plotly_data())
## # A tibble: 5 × 2
## cut n
## <ord> <int>
## 1 Fair 1610
## 2 Good 4906
## 3 Very Good 12082
## 4 Premium 13791
## 5 Ideal 21551
The above introduction to plot_ly() has mainly focused
on concepts unique to the R package plotly that are
generally useful for creating most kinds of data views.
The next section outlines how plotly generates plotly.js figures and how to inspect the underlying data structure that plotly.js uses to render the graph.
Not only is this information useful for debugging, but it’s also a nice way to learn how to work with plotly.js directly, which you may need to improve performance in shiny apps and/or for adding custom behavior with JavaScript.
When you print any plotly object, the
plotly_build() function is applied to that object, and that
generates an R list which adheres to a syntax that plotly.js
understands.
This syntax is a JavaScript Object Notation (JSON) specification that plotly.js uses to represent, serialize, and render web graphics.
The following figure shows how this workflow applies to a simple bar graph (with values supplied directly instead of a data column name reference), but the same concept applies to any graph created via plotly.
knitr::include_graphics("printing.png")
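The same idea can be sketched without the figure, using hypothetical values: a bar chart written as a plain R list in the shape plotly.js expects (a list of traces under data), then serialized to JSON with the jsonlite package, which is assumed to be installed:

```r
library(jsonlite)

# Hypothetical bar chart as an R list: 'data' holds a list of traces,
# and each trace is a named list of attributes
fig <- list(
  data = list(
    list(type = "bar", x = c("A", "B"), y = c(1, 2))
  )
)

# auto_unbox = TRUE writes length-1 vectors as JSON scalars
fig_json <- toJSON(fig, auto_unbox = TRUE)
fig_json
## {"data":[{"type":"bar","x":["A","B"],"y":[1,2]}]}
```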
Run the following in RStudio:
plot_ly(diamonds, x = ~cut)
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
and check in which formats you can export it from the
Viewer tab. Compare with the formats in which you can export the
following standard plot from the Plots tab:
ggplot(diamonds) +
geom_bar(aes(x = cut))
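Exporting is also possible from code. Because a plotly figure is an htmlwidget, a sketch using htmlwidgets::saveWidget() (the output file name is arbitrary) can write it to an HTML page; selfcontained = FALSE is used here only so that pandoc is not required:

```r
library(plotly)
library(htmlwidgets)
data(diamonds, package = "ggplot2")

p_cut <- plot_ly(diamonds, x = ~cut, type = "histogram")

# Write the widget to an HTML file; with selfcontained = FALSE the
# JavaScript dependencies go into a folder next to the file
out_file <- file.path(tempdir(), "cut-histogram.html")
saveWidget(p_cut, out_file, selfcontained = FALSE)
file.exists(out_file)
```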
A lot of the plotly documentation available online (e.g., the online reference) implicitly refers to this JSON specification, so it can be helpful to know how to “work backwards” from that documentation (i.e., translate JSON into R code).
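As a small example of working backwards, suppose the reference describes a (hypothetical) marker setting as the JSON object in the comment below. Each JSON object becomes a named R list(); jsonlite (assumed installed) confirms the translation before the list is passed to add_histogram():

```r
library(plotly)
library(jsonlite)
data(diamonds, package = "ggplot2")

# Target JSON (hypothetical):
# {"marker":{"color":"lightgray","line":{"color":"black","width":2}}}
marker_spec <- list(
  color = "lightgray",
  line = list(color = "black", width = 2)  # nested object -> nested list
)

# Serializing the R list reproduces the JSON we started from
marker_json <- toJSON(list(marker = marker_spec), auto_unbox = TRUE)
marker_json

# The same list is then handed to an add_*() function
p_marker <- plot_ly(diamonds) %>% add_histogram(x = ~cut, marker = marker_spec)
b_marker <- plotly_build(p_marker)
b_marker$x$data[[1]]$marker$line$width
```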
Recall our plot from the previous exercise:
(p1 <-
diamonds %>%
plot_ly( "" ) %>%
add_histogram( "" ) %>%
dplyr::""( "" ) %>%
summarise(n = "") %>%
add_text(
text = ~scales::comma(n), y = ~n,
textposition = "top middle",
cliponaxis = FALSE
)
)
The legend in this plot is redundant. How do we get rid of it? First, try ?plot_ly, which shows the basic functionality of plotly objects in R.
Second, try ?add_trace
Third, try: https://plotly.com/r/reference/
Example:
How do we change the background color of the modebar to red? (notice the hierarchical structure):
p1 %>% layout(modebar = list(bgcolor = "red"))
Exercises:
1. Get rid of the trace0 text in the tooltip for the
histogram. Get rid of the tooltip for the text layer entirely (hint:
hoverinfo).
2. … (hint: hoverlabel). Note:
there are several ways to do this. What are the differences?
As the diagram suggests, both the plotly_build() and
plotly_json() functions can be used to inspect the
underlying data structure on both the R and JSON side of things.
For example, the following shows the data portion of the
JSON created for the plot stored in p:
(p <- plot_ly(diamonds, x = ~cut, color = ~clarity, colors = "Accent"))
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
plotly_json(p)
Which is different from:
p
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
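plotly_json() can also return the JSON as text instead of opening the interactive viewer. A sketch, assuming its jsonedit argument controls whether the viewer is used, and parsing the result back with jsonlite to check the two top-level components discussed next:

```r
library(plotly)
library(jsonlite)
data(diamonds, package = "ggplot2")

p_clarity <- plot_ly(diamonds, x = ~cut, color = ~clarity,
                     colors = "Accent", type = "histogram")

# jsonedit = FALSE returns the JSON text rather than a viewer widget
json_txt <- plotly_json(p_clarity, jsonedit = FALSE)

# Parse it back into nested R lists to inspect the figure definition
fig <- fromJSON(json_txt, simplifyVector = FALSE)
names(fig)
length(fig$data)  # number of traces
```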
In plotly.js terminology, a figure has two key components:
data (aka, traces) and a layout.
Every trace has a type (e.g., histogram, pie, scatter, etc.)
and the trace type determines what other attributes (i.e., visual and/or
interactive properties, like x, hoverinfo,
name) are available to control the trace mapping.
That is, not every trace attribute is available to every trace type,
but many attributes (e.g., the name of the trace) are
available in every trace type and serve a similar purpose.
A trace defines a mapping from data to visuals.
A trace is similar in concept to a layer, but it’s not quite the same. In many cases, as in
plot_ly(diamonds, x = ~cut, color = ~clarity, colors = "Accent")
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
it makes sense to implement a single layer as multiple traces. This is due to the design of plotly.js and how traces are tied to legends and hover behavior.
Inspecting the JSON object, we see that it takes 8 traces to generate the dodged bar chart.
plotly_json(p)
Instead of clicking through the JSON viewer, sometimes it’s easier to use
plotly_build() and compute on the plotly.js figure
definition to verify certain things exist.
Since plotly uses the htmlwidgets
standard, the actual plotly.js figure definition appears under a list
element named x.
(The htmlwidgets package provides a foundation for other packages to implement R bindings to JavaScript libraries so that those bindings work in various contexts, e.g., the R console, RStudio, inside rmarkdown documents, shiny apps, etc.)
Use plotly_build() to get at the plotly.js definition
behind any plotly object:
b <- plotly_build(p)
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#histogram
Again, notice that
str(b) # the build object
is somewhat different from
str(p) # the plotly object
Confirm that there are 8 traces:
length(b$x$data)
## [1] 8
Extract the name of each trace. plotly.js uses
name to populate legend entries and tooltips:
purrr::map_chr(b$x$data, "name")
## [1] "IF" "VVS1" "VVS2" "VS1" "VS2" "SI1" "SI2" "I1"
Every trace has a type of histogram:
purrr::map_chr(b$x$data, "type")
## [1] "histogram" "histogram" "histogram" "histogram" "histogram" "histogram"
## [7] "histogram" "histogram"
Here we’ve learned that plotly creates 8 histogram
traces to generate the dodged bar chart: one trace for each level of
clarity. Although the x-axis is discrete, plotly.js still
considers this a histogram because it generates counts in the
browser.
Why one trace per category?
Answer: to populate a tooltip and a legend entry for each clarity level, and to allow hiding individual categories (by clicking their legend entries).
Notice how the trace name, which was redundant in the earlier histogram-and-text example, is now useful.
If we investigated further, we’d notice that color and
colors are not officially part of the plotly.js figure
definition; they are arguments of the plot_ly function in
R.
The plotly_build() function has effectively transformed
that information into a sensible plotly.js figure definition (e.g.,
marker.color contains the actual bar color codes).
In fact, the color argument in plot_ly() is
just one example of an abstraction the R package has built on top of
plotly.js to make it easier to map data values to visual attributes.
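A sketch that checks this on the built figure: neither color nor colors should survive as a trace attribute, while every trace carries a concrete marker.color (Accent supplies 8 distinct colors for the 8 clarity levels):

```r
library(plotly)
data(diamonds, package = "ggplot2")

p_clarity <- plot_ly(diamonds, x = ~cut, color = ~clarity,
                     colors = "Accent", type = "histogram")
b_clarity <- plotly_build(p_clarity)

# 'color' and 'colors' are plot_ly() arguments, not plotly.js attributes
has_r_args <- vapply(b_clarity$x$data,
                     function(tr) any(c("color", "colors") %in% names(tr)),
                     logical(1))
any(has_r_args)

# Instead, each trace has its actual fill color under marker.color
fill_cols <- unlist(lapply(b_clarity$x$data, function(tr) tr$marker$color))
fill_cols
```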
The ggplotly() function from the plotly package has the ability to translate ggplot2 to plotly. This functionality can be really helpful for quickly adding interactivity to your existing ggplot2 workflow.
Moreover, even if you know plot_ly() and plotly.js well, ggplotly() can still be desirable for creating visualizations that aren’t necessarily straightforward to achieve without it.
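Because ggplotly() returns an ordinary plotly object, plotly functions such as layout() can be applied to the converted figure afterwards. A minimal sketch with a simple bar chart (the hovermode setting is just an arbitrary layout attribute):

```r
library(plotly)  # also attaches ggplot2
data(diamonds, package = "ggplot2")

g <- ggplot(diamonds, aes(x = cut)) + geom_bar()

# The converted figure can be piped into plotly functions like any other
gp <- ggplotly(g) %>% layout(hovermode = "x")

class(gp)
plotly_build(gp)$x$data[[1]]$type
```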
Let’s explore the relationship between price and other variables from
the diamonds dataset.
Hexagonal binning (i.e., geom_hex()) is a useful way to
visualize a 2D density (see, e.g., https://www.meccanismocomplesso.org/en/hexagonal-binning-a-new-method-of-visualization-for-data-analysis/),
like the relationship between price and
carat.
(p <-
ggplot(diamonds, aes(x = log(carat), y = log(price))) +
geom_hex(bins = 100))
We can see that there is a strong positive linear relationship between the log of carat and the log of price. The plot also shows that, for many diamonds, the carat is rounded to a particular value (indicated by the light blue bands), and that no diamonds are priced around $1500.
Making this plot interactive makes it easier to decode the hexagonal colors into the counts that they represent:
ggplotly(p)
ggplotly() is effective in leveraging
ggplot2’s consistent and expressive interface for
exploring statistical summaries across groups.
For example, by including a discrete color variable
(e.g., cut) with geom_freqpoly(), you get a
frequency polygon for each level of that variable.
This ability to quickly generate visual encodings of statistical
summaries across an arbitrary number of groups works for basically any
geom (e.g., geom_boxplot(), geom_histogram(),
geom_density(), etc.) and is a key feature of
ggplot2.
(p <-
ggplot(diamonds, aes(x = log(price), color = clarity)) +
geom_freqpoly())
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplotly(p)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
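To connect this with the earlier discussion of traces, a sketch inspecting the converted figure: each clarity level should contribute its own trace, and hence its own legend entry. bins = 30 only makes the default explicit and silences the message:

```r
library(plotly)  # also attaches ggplot2
data(diamonds, package = "ggplot2")

g <- ggplot(diamonds, aes(x = log(price), color = clarity)) +
  geom_freqpoly(bins = 30)

# Inspect the plotly.js traces that ggplotly() generated
freq_traces <- plotly_build(ggplotly(g))$x$data
length(freq_traces)
```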
Now, to see how price varies with both cut
and clarity, we could repeat this same visualization for
each level of cut.
This is where ggplot2’s facet_wrap()
comes in handy. Moreover, to facilitate comparisons, we can have
geom_freqpoly() display relative rather than absolute
frequencies.
(p <-
ggplot(diamonds, aes(x = log(price), color = clarity)) +
geom_freqpoly(stat = "density") +
facet_wrap(~cut))
By making this plot interactive, we can more easily compare particular levels of clarity by leveraging the legend filtering capabilities.
ggplotly(p)
Play with the above plot - what do you like about this? What do you think is potentially limiting?
In addition to supporting most of the ‘core’ ggplot2
API, ggplotly() can automatically convert any
ggplot2 extension packages that return a ‘standard’
ggplot2 object.
“Standard” means that the object is comprised of ‘core’ ggplot2 data structures and not the result of custom geoms.
(ggplotly() can actually convert custom geoms as well,
but each one requires a custom hook, and many custom geoms are not yet
supported.)
Some great examples of R packages that extend ggplot2 using core data structures are ggforce, naniar, and GGally.
Another way of visualizing the same information found in the previous
plot is by using geom_sina() from the
ggforce package (instead of
geom_freqpoly()).
This visualization jitters the raw data within the density for each
group, allowing us to see not only where the majority of observations fall
within a group, but also across all groups. The second layer of the plot
uses ggplot2’s stat_summary() to overlay a
95% confidence interval estimated with a bootstrap algorithm from the
Hmisc package.
(p <-
ggplot(diamonds, aes(x = clarity, y = log(price), color = clarity)) +
ggforce::geom_sina(alpha = 0.1) +
stat_summary(fun.data = "mean_cl_boot", color = "black") +
facet_wrap(~cut))
By making this layer interactive, we can query individual points for more information and zoom into interesting regions.
toWebGL(ggplotly(p))
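toWebGL() asks plotly to use WebGL-based trace types where they exist, which helps with tens of thousands of points like these. A sketch on a small scatter plot with random data, assuming the switch happens when the figure is built (here from scatter to scattergl):

```r
library(plotly)

set.seed(1)
# Small scatter plot with random (hypothetical) data
p_pts <- plot_ly(x = 1:100, y = rnorm(100), type = "scatter", mode = "markers")

# After toWebGL(), the built figure uses the WebGL variant of the trace type
gl_type <- plotly_build(toWebGL(p_pts))$x$data[[1]]$type
gl_type
```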
It’s surprising that diamond price would decline as diamond clarity increases.
As it turns out, if we account for the carat of the diamond, then we see that better diamond clarity does indeed lead to a higher diamond price (this is a great example of “Simpson’s paradox”).
Seeing such a strong pattern in the residuals of a simple linear model
of carat vs. price indicates that our model could be greatly improved by
adding clarity as a predictor of price:
m <- lm(log(price) ~ log(carat), data = diamonds)
(diamonds <- modelr::add_residuals(diamonds, m))
## # A tibble: 53,940 × 11
## carat cut color clarity depth table price x y z resid
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 -0.199
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 -0.0464
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 -0.196
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63 -0.563
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 -0.672
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 -0.240
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 -0.240
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 -0.371
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 -0.0912
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 -0.163
## # ℹ 53,930 more rows
(p <-
ggplot(diamonds, aes(x = clarity, y = resid, color = clarity)) +
ggforce::geom_sina(alpha = 0.1) +
stat_summary(fun.data = "mean_cl_boot", color = "black") +
facet_wrap(~cut))
toWebGL(ggplotly(p))
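The visual impression can also be checked numerically. A sketch in base R that refits the same model on a fresh copy of the data (so the diamonds object modified above is left alone) and compares median residuals across clarity levels:

```r
# Fresh copy of the original data; the 'diamonds' object above is untouched
dmd <- ggplot2::diamonds

# Same model as above; residuals() is the base-R counterpart of
# modelr::add_residuals()
m_base <- lm(log(price) ~ log(carat), data = dmd)
dmd$resid <- residuals(m_base)

# Median residual per clarity level, from worst (I1) to best (IF)
med_resid <- tapply(dmd$resid, dmd$clarity, median)
round(med_resid, 2)
```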